Automatic Thesaurus Extraction for Thai Text Retrieval Enhancement
Abstract
A thesaurus is one of the most important components of information retrieval (IR) systems. It provides a precise, controlled vocabulary that coordinates document indexing and retrieval, thereby improving retrieval effectiveness. However, the major problem with a manually built thesaurus is that constructing it is labor-intensive, so it is expensive to build and hard to update in a timely manner. This paper therefore proposes an approach for constructing a Thai thesaurus automatically, called a Thai association thesaurus, based on statistical and natural language processing techniques.
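The abstract does not state which statistical measure the Thai association thesaurus uses. As a minimal sketch only, assuming a document co-occurrence measure (here the cosine of two terms' document sets; Dice or mutual information are common alternatives) and input that has already been segmented into words (Thai is written without spaces, so a word segmenter must run first; it is not shown), building an association thesaurus and using it for query expansion could look like this. The function names `build_association_thesaurus` and `expand_query`, the `top_k`/`per_term` parameters, and the toy corpus are hypothetical, not the paper's method.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def build_association_thesaurus(docs, min_score=0.0, top_k=5):
    """Hypothetical sketch: map each term to its most associated terms.

    docs: list of documents, each a list of already-segmented word tokens.
    Association score (an assumed choice, cosine over document sets):
        score(a, b) = |D(a) & D(b)| / sqrt(|D(a)| * |D(b)|)
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for tokens in docs:
        terms = sorted(set(tokens))  # count each term once per document
        doc_freq.update(terms)
        pair_freq.update(combinations(terms, 2))  # unordered pairs

    thesaurus = defaultdict(list)
    for (a, b), co in pair_freq.items():
        score = co / sqrt(doc_freq[a] * doc_freq[b])
        if score > min_score:
            # association is symmetric, so record it in both directions
            thesaurus[a].append((b, score))
            thesaurus[b].append((a, score))

    # keep the strongest associations per term (ties broken alphabetically)
    return {
        term: sorted(related, key=lambda x: (-x[1], x[0]))[:top_k]
        for term, related in thesaurus.items()
    }


def expand_query(query_terms, thesaurus, per_term=2):
    """Append up to per_term associated terms for each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        for related, _score in thesaurus.get(term, [])[:per_term]:
            if related not in expanded:
                expanded.append(related)
    return expanded


# Toy pre-segmented corpus (glosses: ข้าว rice, ชาวนา farmer,
# นา rice field, ราคา price, น้ำมัน oil)
docs = [
    ["ข้าว", "ชาวนา", "นา"],
    ["ข้าว", "ราคา"],
    ["ข้าว", "ชาวนา"],
    ["ราคา", "น้ำมัน"],
]
th = build_association_thesaurus(docs)
print(th["ข้าว"])
print(expand_query(["ข้าว"], th))
```

In this toy corpus "rice" co-occurs with "farmer" in two of its three documents, so "farmer" ranks first and a query for "rice" is expanded with "farmer" and "rice field". A real system would also filter stop words and rare terms before scoring.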
Similar papers
Automatic Thai Ontology Construction and Maintenance System
Ontology is an essential resource to enhance the performance of Information Processing system such as information integration, document classification in taxonomies, including information retrieval and data cleaning in database system. This paper proposes three methodologies for Automatic Thai Ontology Construction and Maintenance from technical corpus, dictionary and thesaurus. For corpus base...
Corpus-based terminology extraction applied to information access
This paper presents an application of corpus-based terminology extraction in interactive information retrieval. In this approach, the terminology obtained in an automatic extraction procedure is used, without any manual revision, to provide retrieval indexes and a “browsing by phrases” facility for document accessing in an interactive retrieval search interface. We argue that the combination of...
Terminology Retrieval: Towards a Synergy between Thesaurus and Free Text Searching
Multilingual Information Retrieval usually forces a choice between free-text indexing and indexing by means of a multilingual thesaurus. However, since both approaches share the same objectives, a synergy between them is possible. This paper presents a retrieval framework that makes use of terminological information in free-text indexing. The Automatic Terminology Extraction task, which is used for thes...
An Enhancement of Thai Text Retrieval Efficiency by Automatic Backward Transliteration
Loan words, which are borrowed from foreign languages, are used in many languages such as Japanese, Chinese, Korean and Thai. They affect Thai Text Retrieval (TTR) systems, leading to inaccurate term weights for indexing and text clustering. Therefore, there is a need for automatic backward transliteration that can solve this problem. In this paper, we propose a hybrid model approa...
A Method for Keyword Extraction and Word Weighting to Improve the Classification of Persian Texts
Due to the ever-increasing growth of information and the huge number of existing unstructured documents, keywords play a very important role in information retrieval. Because manual keyword extraction faces various challenges, automated extraction seems inevitable. This research attempts to use a thesaurus (a structured word net) to extract keywords automatically. A...
Publication date: 2000